
[Redis] message lost prevention #757

Closed
wants to merge 10 commits into from

Conversation

@cshum (Contributor) commented Feb 5, 2019

Per #750, the main idea of the changes is to swap BRPOP/RPOP for BRPOPLPUSH/RPOPLPUSH and make sure writes happen before deletes (a rough sketch follows the list):

  • On receiveMessage, BRPOPLPUSH/RPOPLPUSH fetches the result and also moves it into the :processing list
  • At the end of processResult, after ZADD, LREM removes the message from :processing
  • If the worker crashes or the connection drops between receiveMessage and processResult, the message is still left intact in the :processing list
  • When all jobs are done and the queue is empty, RENAMENX migrates the :processing list back into the main queue.
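
A minimal sketch of this flow in Python with redis-py (the PR itself targets the PHP drivers); the key names enqueue:queue, the :processing suffix and the reserved sorted set are illustrative placeholders, not the library's actual key layout:

```python
import time

import redis

r = redis.Redis()

QUEUE = "enqueue:queue"                 # placeholder key names
PROCESSING = QUEUE + ":processing"
RESERVED = QUEUE + ":reserved"


def receive_message(timeout=5):
    # BRPOPLPUSH pops from the queue and pushes onto :processing in one
    # atomic step, so a crash after this call leaves the message recoverable.
    return r.brpoplpush(QUEUE, PROCESSING, timeout=timeout)


def process_result(raw_message, redeliver_after=300):
    # Write first (ZADD into the reserved set), delete last (LREM from
    # :processing); a failure in between duplicates rather than loses.
    r.zadd(RESERVED, {raw_message: time.time() + redeliver_after})
    r.lrem(PROCESSING, 1, raw_message)


def migrate_processing_if_idle():
    # Once the main queue is empty, orphaned :processing entries are moved
    # back; RENAMENX refuses to overwrite an existing destination key.
    if r.llen(QUEUE) == 0 and r.exists(PROCESSING):
        r.renamenx(PROCESSING, QUEUE)
```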

There is still work to be done, but the current version is running in our production and seems to be working so far.
With the revised design, the trade-off is message duplication. If worker jobs are idempotent, this is not a concern.
To cope with duplicates, you can keep a set of message UUIDs and check incoming messages against it; the set can be deleted when the queue becomes empty.
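
A hedged sketch of that duplicate check, again with redis-py; the :uuids set name and the idle-queue cleanup condition are assumptions for illustration, not part of the PR:

```python
import redis

r = redis.Redis()

QUEUE = "enqueue:queue"                 # placeholder key names
PROCESSING = QUEUE + ":processing"
DEDUP_SET = QUEUE + ":uuids"


def is_duplicate(message_uuid):
    # SADD returns 1 the first time a UUID is seen and 0 on redelivery,
    # so a falsy result marks the message as already processed.
    return r.sadd(DEDUP_SET, message_uuid) == 0


def clear_dedup_set_if_idle():
    # The set can be dropped once the queue and :processing list drain.
    if r.llen(QUEUE) == 0 and r.llen(PROCESSING) == 0:
        r.delete(DEDUP_SET)
```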

Plans/Todo:

  • add methods to the redis interface:
    • brpoplpush
    • rpoplpush
    • lrem
    • renamenx
  • add the new methods to the predis driver
  • add the new methods to the phpredis driver
  • disable RedisSubscriptionConsumer, as BRPOPLPUSH does not support multiple queues; not sure how to fix this yet
  • Refactor receiveMessage:
    • brpoplpush/rpoplpush from queue to :processing
    • zadd reserved
    • lrem queue:processing
  • Refactor requeue:
    • swap LPUSH to be called before ZREM (see the sketch after this list)
  • on queue empty, renamenx :processing back to the main queue
  • message duplicate detection (using a set of uuids)
  • on queue empty, clear the duplication set
  • Fix/add test cases
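
For the requeue item above, a rough sketch (same redis-py placeholders as before) of why the reordering matters: the message is pushed back onto the queue before it is removed from the reserved set, so a crash between the two commands can at worst duplicate the message rather than drop it:

```python
import redis

r = redis.Redis()

QUEUE = "enqueue:queue"                 # placeholder key names
RESERVED = QUEUE + ":reserved"


def requeue(raw_message):
    # Write first: the message is back on the queue before anything is deleted.
    r.lpush(QUEUE, raw_message)
    # Delete last: crashing between the two calls redelivers, never loses.
    r.zrem(RESERVED, raw_message)
```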

@makasim (Member) commented Feb 19, 2019

This could not be merged since it introduces even worse scenarios. With this change, it would not work with more than one consumer at all.

For example, here you migrate all messages, and that would work for one consumer but fail for more.

@makasim closed this Feb 19, 2019

@makasim (Member) commented Feb 19, 2019

If you need a strict message delivery guarantee, then I'd suggest you use a real broker such as RabbitMQ or AmazonSQS.

We've investigated the issue on our end, looking for a solution to fix it. Unfortunately, we've not come up with any better approach. The current one is not ideal, but it remains the best one.

@cshum (Contributor, Author) commented Feb 19, 2019

This could not be merged since it introduces even worse scenarios. With this change, it would not work with more than one consumer at all.

For example, here you migrate all messages, and that would work for one consumer but fail for more.

https://redis.io/commands/renamenx would work for multiple consumers, wouldn't it? If one consumer successfully migrates the messages, other consumers trying to migrate will either get an error when the key does not exist, or 0 if newkey already exists.
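
A small redis-py check of that behaviour with throwaway key names, simulating consumers racing to migrate: the first rename wins, and later attempts either hit "no such key" or return 0 when the destination already exists:

```python
import redis

r = redis.Redis()

r.delete("q", "q:processing")
r.rpush("q:processing", "m1", "m2")

print(r.renamenx("q:processing", "q"))   # truthy: the first consumer wins
try:
    r.renamenx("q:processing", "q")      # the source key no longer exists
except redis.ResponseError as err:
    print(err)                           # "no such key"

r.rpush("q:processing", "m3")
print(r.renamenx("q:processing", "q"))   # falsy: destination "q" already exists
```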

With the added logic it's probably slower and can produce duplicated messages. In our production we have been running this for a while with ~15 workers, and no messages have been lost so far (given Redis is not down).

@cshum (Contributor, Author) commented Feb 19, 2019

If you need a strict message delivery guarantee, then I'd suggest you use a real broker such as RabbitMQ or AmazonSQS.

We've investigated the issue on our end, looking for a solution to fix it. Unfortunately, we've not come up with any better approach. The current one is not ideal, but it remains the best one.

I would suggest dropping this library altogether for the next project.

When someone picks a message queue library they are probably expecting at-least-once delivery. Any pitfalls should have been mentioned in the readme, and the “battle tested, production ready” claim should be dropped.

@makasim (Member) commented Feb 19, 2019

Just a friendly reminder: you got a couple of years of several developers' work for free,

@makasim (Member) commented Feb 19, 2019

and I have to spend my free unpaid time to review your PR and dig into details, which is not always easy and takes a considerable amount of time.

@makasim (Member) commented Feb 19, 2019

You could release your version as a standalone package. I would gladly accept a PR for the docs that describes the current implementation's limitations and the possible message loss. It could contain a link to your package too.

If time proves your approach is better it could be merged here.

For now, I don't have enough resources to go through this PR and be sure it does not break anything.

@cshum (Contributor, Author) commented Feb 19, 2019

Thanks. The PR is clearly unfinished and the tests are failing; of course I am not expecting this to be merged any time soon. Nor do I have time to polish it well enough for a general release.

We had problems and we attempted to fix them. I just thought it was a good idea to share the progress and direction, though. The source here is exactly the same as what we are currently running.

@makasim (Member) commented Feb 21, 2019

@cshum fyi #770
